Computers in Biology and Medicine
Elsevier BV
Preprints posted in the last 7 days, ranked by how well they match Computers in Biology and Medicine's content profile, based on 120 papers previously published here. The average preprint has a 0.15% match score for this journal, so anything above that is already an above-average fit.
Usuzaki, T.; Matsunbo, E.; Inamori, R.
Despite the remarkable progress of artificial intelligence represented by large language models, how AI technologies can contribute to the construction of evidence in evidence-based medicine (EBM) remains an overlooked issue. AI that is compatible with EBM is now needed. In the present paper, we propose an example analysis that may contribute to this approach using a variable Vision Transformer.
Jiang, Q.; Ke, Y.; Sinisterra, L. G.; Elangovan, K.; Li, Z.; Yeo, K. K.; Jonathan, Y.; Ting, D. S. W.
Coronary artery disease is a leading cause of morbidity and mortality. Invasive coronary angiography is currently the gold standard in disease diagnosis. Several studies have attempted to use artificial intelligence (AI) to automate its interpretation with varying levels of success. However, most existing studies cannot generate detailed angiographic reports beyond simple classification or segmentation. This study aims to fine-tune and evaluate the performance of a Vision-Language Model (VLM) in coronary angiogram interpretation and report generation. Using twenty thousand angiogram keyframes of 1,987 patients collated across four unique datasets, we fine-tuned the InternVL2-4B model with Low-Rank Adaptation (LoRA) weights to perform stenosis detection, anatomy labelling, and report generation. The fine-tuned VLM achieved a precision of 0.56, recall of 0.64, and F1-score of 0.60 for stenosis detection. In anatomy segmentation, it attained a weighted precision of 0.50, recall of 0.43, and F1-score of 0.46, with higher scores in major vessel segments. Report generation integrating multiple angiographic projection views yielded an accuracy of 0.42, a negative predictive value of 0.58, and a specificity of 0.52. This study demonstrates the potential of using a VLM to streamline angiogram interpretation to rapidly provide actionable information to guide management, support care in resource-limited settings, and audit the appropriateness of coronary interventions. Author summary: Coronary artery disease carries a heavy disease burden worldwide, and the coronary angiogram is the gold standard imaging for its diagnosis. Interpreting these complex images and producing clinical reports require significant expertise and time. In this study, we fine-tuned and investigated an open-source VLM, InternVL2-4B, to interpret and report coronary angiogram images in key tasks including stenosis detection, anatomy identification, and full report generation.
We also referenced the fine-tuned InternVL2-4B against a state-of-the-art segmentation model, YOLOv8x, which was evaluated on the same test sets. We examined how machine learning metrics such as the intersection-over-union score may not fully capture the clinical accuracy of model predictions, and discussed the limitations of relying solely on these metrics for evaluating clinical AI systems. Although the model has not yet achieved expert-level interpretation, our results demonstrate the potential and feasibility of automating the reporting of coronary angiograms. Such systems could assist cardiologists by improving reporting efficiency, highlighting lesions that may require review, and enabling automated calculation of clinical scores such as the SYNTAX score.
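The detection metrics quoted above are internally consistent: the F1-score is the harmonic mean of precision and recall. A minimal check in Python, using the values reported in the abstract:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Stenosis detection: precision 0.56, recall 0.64 -> F1 ~ 0.60
print(round(f1_score(0.56, 0.64), 2))  # 0.6
# Anatomy segmentation: precision 0.50, recall 0.43 -> F1 ~ 0.46
print(round(f1_score(0.50, 0.43), 2))  # 0.46
```

Both reported F1-scores follow from their precision/recall pairs to two decimal places.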
rani, a.; mishra, s.
Accurate histopathological differentiation between High-Grade Serous Carcinoma (HGSC) and Low-Grade Serous Carcinoma (LGSC) remains a critical yet challenging aspect of ovarian cancer diagnosis due to their similar morphology and different clinical outcomes. This study presents a deep learning framework that uses custom attention mechanisms, including the Convolutional Block Attention Module (CBAM), Squeeze-and-Excitation (SE) blocks, and a Differential Attention module, within five CNN architectures for automated binary classification of ovarian cancer subtypes from hematoxylin-and-eosin-stained whole-slide image (H&E WSI) patches. Although individual models achieved high accuracy, the ensemble stacking framework with a shallow MLP meta-learner delivered the best overall performance, with a ROC-AUC of 0.9211, an accuracy of 0.85, and F1-scores of 0.84 and 0.85 across the two subtypes. These findings demonstrate that attention-guided feature recalibration combined with ensemble stacking provides robust and clinically interpretable discrimination of ovarian carcinoma subtypes.
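As an illustration of one of the attention modules named above, a squeeze-and-excitation block can be sketched in a few lines of NumPy. This is a minimal sketch with random illustrative weights and a channel-last layout, not the authors' implementation:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation: recalibrate channels of an (H, W, C) feature map.

    w1: (C, C//r) reduction weights; w2: (C//r, C) expansion weights.
    """
    # Squeeze: global average pooling over the spatial dims -> (C,)
    z = x.mean(axis=(0, 1))
    # Excitation: bottleneck MLP, ReLU then sigmoid gating
    s = np.maximum(z @ w1, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(s @ w2)))  # per-channel weights in (0, 1)
    # Scale: reweight each channel of the input
    return x * gate

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))          # toy feature map
w1 = rng.normal(size=(16, 4)) * 0.1      # reduction ratio r = 4
w2 = rng.normal(size=(4, 16)) * 0.1
y = se_block(x, w1, w2)
assert y.shape == x.shape
```

Because the gate lies in (0, 1), each channel is attenuated rather than amplified; in a trained network the gates learn which channels to emphasize.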
Ferguson, D. J.
Background: Clinical pharmacists, trainees, and educators rely on multi-database literature retrieval and structured evidence synthesis to answer drug-information questions. Existing workflows require navigation across PubMed, DailyMed, LactMed, interaction checkers, and specialty guideline repositories, with manual de-duplication, appraisal, and synthesis. Commercial platforms that integrate these functions are costly and often unavailable in community, rural, and international training contexts. Objective: This report describes the architecture of AuditMed, a single-file, browser-based clinical evidence audit platform, and reports preliminary stress-test results against a complex multi-morbidity case corpus. AuditMed is intended for research and educational use and is not a substitute for clinical judgment or validated commercial clinical decision-support systems. Methods: AuditMed integrates nineteen free, publicly available clinical and biomedical application programming interfaces into a six-stage Search → Select → Parse → Analyze → Infer → Create pipeline and supports browser-local patient-case ingestion with regex-based HIPAA Safe Harbor de-identification. Preliminary stress-testing was conducted against eleven cases (Cases 30 through 40) from the Complex Clinical Case Compendium Software Validation Suite, each featuring over twenty concurrent active disease states. For each case, the one-click inference pipeline was executed with default settings and the full Clinical Inference Report was captured verbatim. No retrieval-sensitivity, synthesis-fidelity, or time-to-answer endpoints were pre-specified; the exercise was qualitative and oriented toward pipeline behavior under extreme multi-morbidity. Results: The pipeline completed without fatal errors for all eleven cases and produced a structured Clinical Inference Report in each instance. Quantitative-finding detection performed as designed for hematologic parameters and cardiac biomarkers. Two parser defects were identified and are reproduced in the appendix: an age-as-fever regex-precedence defect affecting seven cases and a diagnosis-versus-medication parsing defect affecting one case. The evidence-linkage rate varied from zero evidence-linked statements in seven cases to eleven in one case, reflecting the dependence of the inference layer on MeSH-indexed literature coverage of the specific case diagnoses. Conclusions: AuditMed is an early-stage, open-source platform whose value at this stage lies in providing a free, transparent, auditable workflow for multi-source evidence synthesis with explicit uncertainty flagging. The preliminary results document both robust end-to-end completion under extreme case complexity and specific, reproducible parser defects that will be addressed before formal evaluation. Planned evaluation studies are described.
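The "age-as-fever regex-precedence defect" class can be reconstructed with a toy example. The patterns below are hypothetical illustrations of the defect, not AuditMed's actual regexes:

```python
import re

note = "72 year old male with fever, temperature 101.3 F"

# Naive pattern: grabs the first number in the note, so the age (72)
# is misread as a temperature -- the kind of precedence defect described.
naive = re.search(r"(\d+(?:\.\d+)?)", note)
assert naive.group(1) == "72"

# One fix: anchor the value to an explicit context word or unit, and let
# a dedicated age pattern claim the "NN year old" span.
temp = re.search(r"temperature\s+(\d+(?:\.\d+)?)\s*F\b", note)
age = re.search(r"(\d+)\s*[- ]?year[- ]old", note)
assert temp.group(1) == "101.3"
assert age.group(1) == "72"
```

In a multi-pattern parser, running the more specific (contextual) patterns first and removing their matched spans before applying generic numeric patterns avoids this class of error.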
El Bab, M.; Guvenis, A.
Conflicting evidence on scatter correction (SC) methods plagues quantitative myocardial perfusion SPECT (MPI), hindering standardized clinical protocols. This simulation study, utilizing the SIMIND Monte Carlo program and a highly realistic 4D XCAT phantom, systematically evaluates Dual Energy Window (DEW, with k = 0.5) and Triple Energy Window (TEW) SC techniques. We uniquely investigate their performance across various photopeak window widths (2, 4, and 6 keV) and novel overlapped/non-overlapped configurations specifically for Tc-99m MPI, parameters largely unexplored in realistic cardiac models. Images were reconstructed with OSEM under uncorrected (UC), SC, and combined attenuation- and scatter-corrected (ACSC) conditions. Quantitative analysis focused on the signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), defect contrast, and relative noise-to-background (RNB). Our findings consistently show ACSC's superior performance in CNR, SNR, and defect contrast, confirming its critical role. Interestingly, SC alone reduced noise but compromised defect contrast relative to UC, highlighting a potential trade-off without attenuation correction. Crucially, this study reveals minimal influence of photopeak window width and overlap configuration on image quality, and no significant difference between DEW and TEW across most metrics. These results provide essential evidence for optimizing quantitative MPI protocols, suggesting that for Tc-99m, the choice between DEW and TEW, and the specific window settings, may be less critical than ensuring robust attenuation correction.
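The two estimators compared above have simple textbook forms: DEW scales the counts in a lower sub-window by a factor k, while TEW trapezoidally interpolates between two narrow side windows flanking the photopeak. The counts and window widths below are illustrative, not taken from the study:

```python
def dew_scatter(c_sub, k=0.5):
    """Dual Energy Window estimate: scatter ~ k x counts in the lower sub-window."""
    return k * c_sub

def tew_scatter(c_left, c_right, w_side, w_peak):
    """Triple Energy Window estimate: trapezoidal interpolation between two
    narrow side windows of width w_side flanking a photopeak window w_peak."""
    return (c_left / w_side + c_right / w_side) * w_peak / 2.0

# Toy numbers: a 20 keV photopeak window with 2 keV side windows
peak_counts = 1000.0
s_tew = tew_scatter(c_left=60.0, c_right=20.0, w_side=2.0, w_peak=20.0)
corrected = peak_counts - s_tew
print(corrected)  # 600.0
```

The study's finding that window settings matter little suggests both estimators land in a similar regime for Tc-99m once attenuation correction is applied.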
Pitti, L.; Sitti, G.; Candia-Rivera, D.
Parkinson's Disease (PD) is a complex neurodegenerative disorder that manifests through systemic, large-scale physiological reorganizations. While research often focuses on region-specific neural changes, there is a growing need for multidomain approaches to capture the complexity of the disease and its clinical heterogeneity. This study proposes an analytical pipeline to evaluate Brain-Heart Interplay (BHI) as a novel systemic biomarker for neurodegeneration and healthy ageing. We assessed BHI across three open-source datasets (EEG and ECG signals), comparing Healthy Young, Healthy Elderly, and PD participants in resting state to investigate the effects of ageing and cognitive performance. Additionally, we studied BHI trends in PD patients at the moment of freezing of gait (FOG). Methodologically, brain network organization was quantified using coherence-based EEG connectivity and graph theory, while heart activity was analyzed through Poincaré-plot-derived measures of cardiac autonomic activity. The coupling between these two systems was measured using the Maximal Information Coefficient to capture linear and non-linear dependencies between global cortical organization and cardiac autonomic outflow. The results demonstrate that BHI is a sensitive biomarker for detecting early multisystem dysfunction in both neurodegeneration and ageing. Furthermore, the identification of specific BHI trends during FOG onset suggests new opportunities for understanding the physiological mechanisms driving motor complications in PD. Our proposed pipeline provides a guiding tool for large-scale physiological assessment in clinical research.
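The Poincaré-plot descriptors used for cardiac autonomic activity have a standard closed form: SD1 and SD2 are the spreads of successive RR-interval pairs across and along the identity line. A minimal sketch with toy RR intervals (the paper's exact measure set is not specified here):

```python
import numpy as np

def poincare_sd1_sd2(rr):
    """SD1/SD2 descriptors of the Poincare plot of successive RR intervals (ms).

    SD1: short-term variability (spread across the identity line);
    SD2: long-term variability (spread along it).
    """
    rr = np.asarray(rr, dtype=float)
    x, y = rr[:-1], rr[1:]          # successive-interval pairs (RR_n, RR_{n+1})
    sd1 = np.std((y - x) / np.sqrt(2))
    sd2 = np.std((y + x) / np.sqrt(2))
    return sd1, sd2

rr = [800, 810, 790, 820, 805, 795, 815]   # toy RR series in ms
sd1, sd2 = poincare_sd1_sd2(rr)
assert sd1 > 0 and sd2 > 0
```

SD1 is driven by beat-to-beat (largely vagally mediated) variability, SD2 by slower trends, which is why the pair is a compact autonomic summary.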
Goetz, C.; Eichenlaub, M.; Schmidt, K.; Wiedmann, F.; Invers Rubio, E.; Martinez Diaz, P.; Luik, A.; Althoff, T.; Schmidt, C.; Loewe, A.
The recently published EHRA/EACVI consensus statement on a standardized bi-atrial regionalization provides new opportunities for consistent regional analyses across patients, imaging modalities and clinical centers. To make this standardized regionalization widely accessible, we developed the open-source software DIVAID, which automatically divides bi-atrial geometries according to the proposed regions, ensuring consistency, reproducibility and operator independence. We evaluated the accuracy of the algorithm by comparing its results to manual expert annotations across 140 geometries from multiple modalities and centers. Veins were automatically clipped correctly in 81% and orifices annotated correctly in 100% of cases. The median (interquartile range; IQR) Dice similarity coefficient (DSC) for left atrial regions was 0.98 (0.96-1.00) for DIVAID-expert and 0.98 (0.94-1.00) for inter-expert comparisons. For right atrial geometries, DSC was higher for DIVAID-expert than for inter-expert comparisons at 0.90 (0.80-0.95) and 0.88 (0.74-0.94), respectively. To assess the accuracy of regional boundaries, we computed the mean average surface distance (MASD) for boundaries derived from automatic or manual annotations. The median (IQR) MASD between DIVAID and experts was 0.17 mm (0.03-0.78) and 1.93 mm (0.65-3.96) in the left and right atrium, respectively. To conclude, DIVAID robustly divides anatomically diverse bi-atrial geometries according to the 15-segment model, while outperforming cardiac experts in both speed and consistency, and demonstrating an accuracy of regional boundaries comparable to the spatial resolution of cardiac imaging modalities. By providing automated, consistent atrial regionalization, DIVAID enables large-scale, standardized regional analyses and data-driven investigation of harmonized, multi-dimensional datasets, which may advance atrial arrhythmia research and personalized treatment strategies.
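The Dice similarity coefficient used to compare DIVAID's regions against expert annotations measures overlap as twice the intersection over the sum of region sizes. A minimal sketch on toy boolean masks:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two boolean region masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

auto   = np.array([1, 1, 1, 0, 0], bool)   # automatic region label
expert = np.array([1, 1, 0, 0, 0], bool)   # expert annotation
print(dice(auto, expert))  # 0.8
```

A DSC of 0.98, as reported for the left atrium, means the automatic and expert regions overlap almost completely relative to their combined area.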
Crystal, O.; Farina, J. M. M.; Scalia, I. G.; Ayoub, C.; Park, H.-B.; Kim, K. A.; Arsanjani, R.; Lester, S. J.; Banerjee, I.
Background: Accurate assessment of left ventricular outflow tract (LVOT) gradients is critical for hypertrophic cardiomyopathy (HCM) management, yet Doppler-based measurements are technically demanding and require expertise. Objective: To develop a multi-view deep learning model capable of classifying LVOT obstruction (>20 mmHg) using routine 2D echocardiographic windows without reliance on Doppler imaging. Methods: We trained and externally validated a cross-attention-based video-to-video fusion framework that integrated EchoPrime-derived video representations from three standard transthoracic echocardiographic views to classify LVOT gradients. Results: Training was performed on a derivation cohort (N = 1833) from a tertiary care system in the United States, with model performance evaluated on an internal held-out test set (N = 275) and a Korean external validation cohort (N = 46). Single-view baselines showed limited discrimination (external AUROCs 0.47-0.70). In contrast, the domain-specific foundation model (EchoPrime) achieved superior single-view performance (AUROCs 0.75-0.80 internal; 0.79-0.83 external), highlighting the importance of echo-specific pretraining and temporal modeling. The proposed multi-view fusion further enhanced predictive performance, with the late fusion model reaching an AUROC of 0.84 on the external cohort despite significant population shift. Conclusions: These results suggest LVOT physiology is encoded in routine 2D imaging and can be leveraged for clinically relevant gradient classification without Doppler input; the proposed AI-guided strategy demonstrates substantial cost savings compared with a screen-all approach. By integrating complementary spatial-temporal information across multiple views, our approach generalizes robustly across populations and may enable real-time decision support, extend LVOT assessment to portable or resource-limited settings, and complement Doppler-based evaluation for longitudinal HCM management.
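The AUROC figures quoted throughout this listing have a simple rank interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative case. A minimal sketch via the Mann-Whitney statistic on toy data:

```python
import numpy as np

def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: the probability that a random
    positive is scored above a random negative (ties count half)."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
         + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

y = [1, 1, 1, 0, 0]              # toy labels (1 = obstruction)
s = [0.9, 0.8, 0.3, 0.4, 0.1]    # toy model scores
print(round(auroc(y, s), 3))  # 0.833
```

An AUROC of 0.84, as reported for the external cohort, means a positive case outranks a negative one about 84% of the time.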
Chen, Z.; Hu, T.; Haddadin, S.; Franklin, D.
There is more to musculotendon path modeling than aligning a cable to reflect the geometric features of a muscle-tendon unit. From the perspective of simulation accuracy, the key is to replicate the length-joint angle and moment arm-joint angle relations of the target muscle. In this study, we propose an effect-oriented approach to automated path modeling via hybrid calibration based on muscle surface mesh and moment arm. The task is formulated as an optimization problem with a threefold objective for the path to: 1) pass through multiple ellipses representing muscle cross-sections, 2) yield moment arms that match experimental measurements, and 3) yield moment arms with the designated signs. The performance of our optimization framework is demonstrated with the musculoskeletal surface mesh from the Visible Human Male and moment arm datasets from the literature, producing 42 paths that are anatomically realistic and biomechanically accurate in 20.1 min. Our optimization framework uses analytically specified gradients, which are faster and more accurate than the default numerical gradients, making it applicable for large-scale subject-specific use.
Deng, F.; Li, H.; Sun, D.; Duan, G.; Sun, Z.; Xue, G.
High levels of protein expression are usually desired in industry and research, and codon optimization is widely used to achieve high expression. Methods for codon optimization fall into two branches: classical methods, which develop cost functions based on empirical laws, and AI methods, which learn codon choice principles from endogenous genes with neural networks. Here we develop two codon optimization tools, one from each branch, namely OptimWiz 2.1 and OptimWiz 3.0. Results of fusion protein fluorescence detection indicate that both OptimWiz 2.1 and OptimWiz 3.0 are superior to all other commercially available codon optimization tools. Principles of codon optimization are revealed in the process of machine learning on both tools.
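The simplest cost function in the classical branch is "one amino acid, one codon": always emit the host's most frequent codon for each residue. A toy sketch of that strategy; the codon frequencies below are illustrative placeholders, not OptimWiz's table:

```python
# Illustrative host codon preferences (hypothetical frequencies, partial table)
CODON_TABLE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "F": {"TTT": 0.58, "TTC": 0.42},
}

def optimize(protein):
    """'One amino acid, one codon' classical strategy: for each residue,
    emit the codon with the highest host usage frequency."""
    return "".join(max(CODON_TABLE[aa], key=CODON_TABLE[aa].get)
                   for aa in protein)

print(optimize("MKF"))  # ATGAAATTT
```

Real classical tools combine such frequency terms with penalties for GC content, repeats, and unwanted motifs; the AI branch instead learns these trade-offs from endogenous genes.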
Li, L. Y.; Lebiecka-Johansen, B.; Byberg, S.; Thambawita, V.; Hulman, A.
Diabetic retinopathy (DR) is a leading cause of vision impairment, requiring accurate and scalable diagnostic tools. Foundation models are increasingly applied to clinical imaging, but concerns remain about their calibration. We evaluated DINOv3, RETFound, and VisionFM for DR classification using different transfer learning strategies in BRSET (n = 16,266) and mBRSET (n = 5,164). Models achieved high discrimination in binary classification (normal vs. retinopathy) in BRSET (AUROC 0.90-0.98), with DINOv3 performing best under full fine-tuning (AUROC 0.98 [95% CI: 0.97-0.99]). External validation on mBRSET showed decreased performance for all models regardless of the fine-tuning strategy (AUROC 0.70-0.85), though fine-tuning improved performance. Foundation models achieved strong discrimination but poor calibration, generally overestimating DR risk. While the generalist model, DINOv3, benefited from deeper fine-tuning, miscalibration remained evident. These findings underscore the need to improve calibration and the comprehensive evaluation of foundation models, both of which are essential in clinical settings. Author summary: Artificial intelligence is increasingly being used to detect eye diseases such as diabetic retinopathy from retinal images. Recent advances have introduced "foundation models," which are trained on large datasets and can be adapted to new tasks. We aimed to evaluate how well these models perform in a clinical prediction context, with a focus not only on accuracy but also on how reliably they estimate disease risk. In this study, we compared different types of foundation models using two independent datasets from Brazil. We found that while these models were generally good at distinguishing between healthy and diseased eyes, their predicted risks were often poorly calibrated. In other words, the estimated probabilities did not consistently reflect the true likelihood of disease.
We also examined whether adapting the models to the target population could improve performance. Although this approach led to improvements, calibration issues remained. However, post-training correction improved the agreement between predicted risks and observed outcomes. Our findings highlight an important gap between model performance and clinical usefulness. We suggest that improving the reliability of risk estimates is essential before such systems can be safely used in healthcare.
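Miscalibration of the kind described here is commonly summarized by the expected calibration error (ECE): bin predictions by confidence, then average the gap between mean predicted risk and the observed event rate. A minimal sketch on toy data (the paper's exact calibration metric is not stated in the abstract):

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence, then average the gap between
    mean predicted risk and observed event rate, weighted by bin size."""
    probs = np.asarray(probs, float)
    labels = np.asarray(labels, float)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece

# A model that systematically overestimates risk, as described above:
probs  = [0.9, 0.9, 0.9, 0.9, 0.1]
labels = [1, 0, 0, 1, 0]
print(round(expected_calibration_error(probs, labels), 2))  # 0.34
```

A model can have near-perfect AUROC (ranking) and still show a large ECE, which is exactly the discrimination-versus-calibration gap the authors highlight.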
Musonda, R.; Ito, K.; Omori, R.; Ito, K.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has continuously evolved since its emergence in the human population in 2019. As of 1 August 2025, more than 1,700 Omicron subvariants had been designated by the Pango nomenclature system. The Pango nomenclature system designates a new lineage based on genetic and epidemiological information about SARS-CoV-2 strains. However, there is a possibility that strains with similar genetic backgrounds and the same phenotype are given different Pango lineage names. In this paper, we propose a new algorithm, called FindPart-w, which can identify groups of viral lineages that share the same relative effective reproduction number. We introduce a new lineage replacement model, called the constrained RelRe model, which constrains groups of lineages to have the same relative effective reproduction number. The FindPart-w algorithm searches for the equality constraints that minimise the Akaike Information Criterion (AIC) of constrained RelRe models. Using hypothetical observation count data created by simulation, we found that the FindPart-w algorithm can identify groups of lineages having the same relative effective reproduction number in practical computational time. Applying FindPart-w to real-world data of time-stamped lineage counts from the United States, we found that the Pango nomenclature system may have given different lineage names to SARS-CoV-2 strains even when they have the same relative effective reproduction number and similar genetic backgrounds. In conclusion, this study showed that viruses with the same relative effective reproduction number are identifiable from temporal count data of viral sequences. These findings will contribute to the future development of lineage designation systems that consider both the genetic backgrounds and transmissibilities of lineages.
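The model-selection step at the core of FindPart-w is standard AIC comparison: tying two lineages to one shared reproduction number removes a free parameter, and the constraint is accepted when the parameter saving outweighs the likelihood cost. A minimal sketch with hypothetical fit values (not from the study):

```python
def aic(k_params, log_likelihood):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k_params - 2 * log_likelihood

# Hypothetical fits: an unconstrained model gives each of 3 lineages its own
# relative reproduction number (3 params); constraining two lineages to share
# one value removes a parameter at a small likelihood cost.
aic_free = aic(3, -120.0)   # 246.0
aic_tied = aic(2, -120.4)   # 244.8
assert aic_tied < aic_free  # the constrained grouping is preferred
```

FindPart-w searches over which such equality constraints to impose, keeping the combination with the lowest AIC.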
Tokodi, M.; Kagiyama, N.; Pandey, A.; Nakamura, Y.; Akama, Y.; Takamatsu, S.; Toki, M.; Kitai, T.; Okada, T.; Lam, C. S.; Yanamala, N.; Sengupta, P.
Background: Accurate assessment of diastolic function and left ventricular (LV) filling pressure is central to heart failure diagnosis and risk stratification. Contemporary guideline algorithms rely on complex parameters that are not consistently available in routine clinical practice. Objective: To compare the diagnostic and prognostic performance of the 2016 American Society of Echocardiography/European Association of Cardiovascular Imaging (ASE/EACVI) and 2025 ASE guidelines with a deep learning model based on routinely acquired echocardiographic variables. Methods: This study evaluated the guideline-based algorithms and a deep learning model in participants from the Atherosclerosis Risk in Communities (ARIC) cohort (n=5450) for prognostication and two invasive hemodynamic validation cohorts from the United States (n=83) and Japan (n=130) for detection of elevated left ventricular filling pressure. Results: In the ARIC cohort, the deep learning model demonstrated superior prognostic performance compared with the 2016 and 2025 guidelines (C-index: 0.676 vs. 0.638 and 0.602, respectively; both p<0.001). Similar findings were observed among participants with preserved ejection fraction (C-index: 0.660 vs. 0.628 and 0.590; both p<0.001), with improved performance compared with the H2FPEF score (C-index: 0.660 vs. 0.607; p<0.001). In the US hemodynamic validation cohort, the deep learning model showed higher diagnostic performance than the 2025 guidelines (AUC: 0.879 vs. 0.822; p=0.041) and similar performance compared with the 2016 guidelines (AUC: 0.879 vs. 0.812; p=0.138). In the Japanese hemodynamic validation cohort, the deep learning model outperformed both guidelines (AUC: 0.816 vs. 0.634 and 0.694; both p<0.05).
Conclusions: A deep learning model leveraging routinely available echocardiographic parameters demonstrated improved diagnostic and prognostic performance compared with contemporary guideline-based approaches, potentially offering a scalable alternative for assessing diastolic function and left ventricular filling pressures.
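The C-index reported for prognostic performance is Harrell's concordance: among usable pairs (the earlier event time is an observed event, not a censoring), the fraction where the earlier failure carries the higher predicted risk. A simplified sketch that ignores ties in event time:

```python
def c_index(time, event, risk):
    """Harrell's C-index (simplified: pairs tied on time are ignored).

    Among usable pairs where subject i fails before subject j, count how
    often the model assigns i the higher risk; ties in risk count half.
    """
    n = len(time)
    conc = ties = usable = 0
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i] == 1:   # i failed before j
                usable += 1
                if risk[i] > risk[j]:
                    conc += 1
                elif risk[i] == risk[j]:
                    ties += 1
    return (conc + 0.5 * ties) / usable

time  = [2, 4, 6, 8]        # toy follow-up times
event = [1, 1, 0, 1]        # 1 = event, 0 = censored
risk  = [0.9, 0.7, 0.6, 0.2]
print(c_index(time, event, risk))  # 1.0
```

A C-index of 0.676, as reported, means the model correctly orders about two-thirds of usable patient pairs by outcome timing.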
Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.
Interpreting machine learning models typically relies on feature attribution methods that quantify the contribution of individual variables to model predictions. However, it remains unclear whether attribution magnitude reflects the true functional importance of features for model performance. Here, we present a unified interpretability framework integrating permutation-based attribution, feature ablation, and stability under perturbation across multiple feature spaces. Using nested cross-validation and permutation-based null diagnostics, we systematically evaluate the relationship between attribution magnitude and functional dependence in clinical and biomarker-based prediction models. Attribution magnitude is frequently misaligned with functional importance, with weak to strong negative correlations observed across feature spaces (Spearman {rho} ranging from -0.374 to -0.917). Features with high attribution often have limited impact on model performance when removed, whereas features with low attribution can be essential for maintaining predictive accuracy. These discrepancies define distinct classes of interpretability failure, including attribution excess and latent dependence. Interpretability further depends on feature space composition, and stable, functionally relevant features are not necessarily those with the highest attribution scores. By integrating attribution, functional impact, and stability into a composite Feature Reliability Score, we identify features that remain informative across perturbations and analytical contexts. These findings indicate that interpretability does not arise from attribution magnitude alone but is better characterized by stability under perturbation. This framework provides a basis for more robust model interpretation and highlights limitations of attribution-centric approaches in high-dimensional and correlated data settings.
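The attribution-versus-ablation gap described above is easy to reproduce with correlated features: a near-duplicate feature can earn a large permutation attribution yet cost almost nothing when removed, because its twin covers for it on refit. A minimal sketch with ordinary least squares (illustrative, not the authors' pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(scale=0.01, size=n)   # near-duplicate of x0
X = np.column_stack([x0, x1])
y = x0 + x1

def fit_predict(X_train, y_train, X_new):
    """Ordinary least-squares fit, then predict on new inputs."""
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    return X_new @ w

def mse(a, b):
    return float(np.mean((a - b) ** 2))

base = mse(fit_predict(X, y, X), y)

# Permutation attribution for x0: shuffle it and measure the error increase
Xp = X.copy()
Xp[:, 0] = rng.permutation(Xp[:, 0])
perm_importance = mse(fit_predict(X, y, Xp), y) - base

# Ablation: drop x0 entirely and refit -- the near-duplicate x1 covers for it
ablation_loss = mse(fit_predict(X[:, [1]], y, X[:, [1]]), y) - base

assert perm_importance > 0.1   # large attribution under permutation
assert ablation_loss < 0.01    # yet almost no functional importance
```

This is the "attribution excess" failure class in miniature: the permutation score is large while the functional (ablation) importance is near zero.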
Walters, R.; Allen, M. B.; Scheen, H.; Beam, C.; Waldrip, Z.; Singule-Kollisch, M.; Varisco, A.; Williams, J. G.; De Luca, D.; Varisco, B. M.
Background: In patients requiring respiratory support, clinicians rely on physical exam, radiologic, laboratory, and ventilator-derived measures to provide sufficient support while minimizing ventilator- and "work of breathing"-induced lung injury. Point-of-care lung ultrasound (LUS) is a widely available tool in hospital and clinic environments. To date, LUS has not been used to evaluate lung strain. Methods: We collected LUS images in four anesthetized, neuromuscularly blocked, and mechanically ventilated pigs being used for another experiment. A feature-tracking tool was developed that tracked echo-bright lung structures in ten-second clips obtained in triplicate of the right and left, upper and lower lung fields using tidal volumes of 4, 6, 8, 10, and 12 mL/kg. Pleural lines were manually drawn, and a program for quantifying lung strain was developed with assistance from Anthropic's Claude artificial intelligence tool. Structures were identified in inspiratory and expiratory frames and tracked bidirectionally, with the median strain per frame used for calculations. Results: Triplicate measures of lung ultrasound images in four pigs had a median coefficient of variation of 35% (IQR 23-47%), and linear modeling of strain against tidal volumes of 4-12 mL/kg showed a positive correlation with R2 values ranging from 0.89 to 0.97. Strain measurements were similar after bronchial administration of 1.5 M hydrochloric acid. Conclusions: Regional lung strain quantification using LUS is a viable and potentially useful tool for respiratory support management.
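The two summary statistics reported above, the coefficient of variation of triplicate measurements and the R2 of strain against tidal volume, are both one-liners. A minimal sketch on toy strain values (illustrative, not the study's data):

```python
import numpy as np

def coefficient_of_variation(x):
    """CV (%) of repeated measurements: sample SD as a fraction of the mean."""
    x = np.asarray(x, float)
    return 100.0 * x.std(ddof=1) / x.mean()

def r_squared(x, y):
    """R^2 of an ordinary least-squares line of y on x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()

# Toy triplicate strain measurements and a strain-vs-tidal-volume series
cv = coefficient_of_variation([0.10, 0.14, 0.18])
print(round(cv, 1))  # 28.6
vt     = [4, 6, 8, 10, 12]             # tidal volume, mL/kg
strain = [0.05, 0.09, 0.12, 0.17, 0.20]
assert r_squared(vt, strain) > 0.95
```

A CV of ~35%, as reported, indicates substantial run-to-run spread in the triplicates even though the strain-versus-volume relationship itself is strongly linear.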
Swee, S.; Adam, I.; Zheng, E. Y.; Ji, E.; Wang, D.; Speier, W.; Hsu, J.; Chang, K.-W.; Shivkumar, K.; Ping, P.
Ambulatory electrocardiograms (ECGs) provide continuous monitoring of the heart's electrical activity. However, many existing machine learning and artificial intelligence models for analyzing ambulatory ECG traces are unimodal and do not incorporate patient clinical context. In this study, we propose a multimodal framework integrating ambulatory ECG-derived representations with clinical text embeddings to predict two cardiac outcomes: sudden cardiac death and pump failure death. Ambulatory ECG traces are preprocessed, segmented, and encoded via a multiple-instance learning and temporal convolutional neural network framework. In parallel, patient clinical features are parsed into structured prompts, which are passed through a large language model to generate clinical reasoning; this reasoning passes through a biomedical language encoder to generate a text embedding. With the ECG and text embeddings, we systematically evaluate multiple fusion strategies, including concatenation- and gating-based approaches, to integrate the two data modalities. Our results demonstrate that multimodal models consistently outperform unimodal baselines, with adaptive fusion mechanisms providing the greatest improvements in predictive performance. Decision curve analysis highlights the potential clinical utility of the proposed framework for risk stratification. Finally, we visualize model attention across modalities, including ECG attention patterns, segment-level saliency, heart rate variability features, and clinical reasoning, to contextualize patient-specific predictions.
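A gating-based fusion of the kind mentioned above can be sketched in a few lines: a learned gate decides, per dimension, how much to trust the ECG embedding versus the clinical-text embedding. This is a hypothetical minimal form with random illustrative weights `W` and `b`, not the paper's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(ecg_emb, text_emb, W, b):
    """Gating-based fusion: a gate computed from both embeddings blends
    them per dimension, g * ecg + (1 - g) * text."""
    g = sigmoid(W @ np.concatenate([ecg_emb, text_emb]) + b)
    return g * ecg_emb + (1.0 - g) * text_emb

rng = np.random.default_rng(0)
d = 8
ecg  = rng.normal(size=d)              # toy ECG embedding
text = rng.normal(size=d)              # toy clinical-text embedding
W = rng.normal(size=(d, 2 * d)) * 0.1  # illustrative gate weights
b = np.zeros(d)
fused = gated_fusion(ecg, text, W, b)
assert fused.shape == (d,)
```

Because the gate is input-dependent, the fusion can adaptively lean on whichever modality is more informative for a given patient, which concatenation alone cannot do.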
Giri, R.; Agrawal, R.; Lamichhane, S. R.; Barma, S.; Mahatara, R.
We are pleased to submit our original article entitled "Assessing medication-related burden and medication adherence among older patients from Central Nepal: A machine learning approach" for consideration in your esteemed journal. In this paper, we assessed medication burden using the validated Living with Medicines Questionnaire (LMQ-3) and medication adherence using the Adherence to Refills and Medications Scale (ARMS). We analysed our results through a machine learning approach, rather than a traditional statistical approach, to identify the complex factors influencing both. Six ML architectures (ordinary least squares, LightGBM, Random Forest, XGBoost, SVM, and penalized linear regression) were employed to predict ARMS and LMQ scores using various socio-demographic, clinical, and medication-related predictive features. Model explainability was provided through SHAP (SHapley Additive exPlanations). Our study identified a moderate medication burden and moderate non-adherence among older adults. Requiring assistance with medication and polypharmacy were the strongest drivers of medication burden and non-adherence. The high predictive accuracy of the ML models suggests that appropriate clinical interventions, such as deprescribing, could address the highly prevalent medication burden and non-adherence among older adults in Nepal.
Matthewman, J.; Denaxas, S.; Langan, S.; Painter, J. L.; Bate, A.
Objectives: Large language models (LLMs) have shown promise in creating clinical codelists for research purposes, a time-consuming task requiring expert domain knowledge. Here, we evaluate the performance and assess the failure modes of a retrieval-augmented generation (RAG) approach to creating clinical codelists for the large and complex medical terminology used by the Clinical Practice Research Datalink (CPRD). Materials & Methods: We set up a RAG system using a database of word embeddings of the medical terminology, created with a general-purpose embedding model (gemini-embedding). We developed 7 reference codelists presenting different challenges and tagged required and optional codes. We ran 168 evaluations (7 codelists, 2 different database subsets, 4 models, 3 epochs each). Scoring was based on the omission of required codes and the inclusion of irrelevant codes. We used model-grading (i.e., grading by another LLM with the reference codelists provided as context) to evaluate the output codelists (a score of 0% being all incorrect and 100% being all correct). Results: We saw varying accuracy across models and codelists, with Gemini 3 Pro (score 43%) generally performing better than Claude Sonnet 4.6 (36%) and Gemini 3 Flash, and OpenAI GPT 5.2 performing worst (14%). Models performed better with shorter target codelists (e.g., eosinophilic esophagitis with four codes, and hidradenitis suppurativa with 14 codes). In contrast, all models consistently failed to produce a complete wrist fracture codelist (with 214 required codes). We further present evaluation summaries and failure-mode evaluations produced by parsing LLM chat logs. Discussion: Besides demonstrating that a single-shot RAG approach is currently not suitable for codelist generation, we demonstrate failure modes including hallucinations, retrieval failures, and generation failures where retrieved codes are not used.
Conclusions: Our findings suggest that while RAG systems built on current frontier LLMs may create correct clinical codelists in some cases, they still struggle with large, complex terminologies and codelists containing many codes. The failure modes we highlight can inform the design of future workflows that avoid them.
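The retrieve-then-score setup described in this abstract can be illustrated with a minimal sketch. All names here (`retrieve_candidate_codes`, `score_codelist`) are hypothetical, the cosine-similarity search merely stands in for the paper's gemini-embedding database, and the deterministic score is a simplified stand-in for the paper's LLM-based model-grading:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve_candidate_codes(query_vec, code_vecs, codes, top_k=3):
    """Rank terminology codes by embedding similarity to the query concept."""
    sims = [cosine_sim(query_vec, v) for v in code_vecs]
    order = np.argsort(sims)[::-1][:top_k]  # descending similarity
    return [codes[i] for i in order]

def score_codelist(generated, required, irrelevant):
    """Toy version of the paper's rubric: penalise omitted required codes
    and included irrelevant codes (a stand-in for LLM model-grading)."""
    omitted = sum(1 for c in required if c not in generated)
    noise = sum(1 for c in generated if c in irrelevant)
    total = len(required) + len(generated) or 1
    return max(0.0, 1.0 - (omitted + noise) / total)
```

A codelist that contains every required code and no irrelevant ones scores 1.0; each omission or irrelevant inclusion pulls the score down.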
Anantha Krishnan, A.; Dinning, P. G.; Holland, M. A.
Purpose: Colonic motility disorders, including diarrhea-predominant irritable bowel syndrome and slow-transit constipation, impose a major clinical burden. Although high-resolution colonic manometry reveals characteristic spatiotemporal motor patterns in healthy individuals, such as high-amplitude propagating contractions and the cyclic motor pattern, these patterns are often altered or absent in disease. Understanding how these patterns arise from underlying pacemaker, neural, and mechanical mechanisms is essential for improving treatment strategies. Methods: We developed a biophysical whole-colon model that integrates an Interstitial Cells of Cajal-inspired oscillator network, enteric nervous system reflexes, a pressure-gated modulation element motivated by rectosigmoid brake behavior, and a nonlinear tube law describing colon wall mechanics. The model simulates spatiotemporal pressure patterns along the colon and allows systematic variation of physiological parameters associated with pacemaker activity, neural reflex control, and distal gating. Results: A small set of parameters reproduces three illustrative motility patterns corresponding to healthy motility, diarrhea-predominant irritable bowel syndrome, and slow-transit constipation. The simulated pressure maps recapitulate key features observed in high-resolution manometry, including propagation direction, regional patterning of contractions, and case-specific changes in amplitude and coordination. Sensitivity analysis suggests that proximal excitation strength and waveform morphology strongly influence global motility metrics. Conclusion: Our study presents a simple, biophysical framework for reproducing clinically observed colonic motor patterns and exploring their disruption in disease. More broadly, the model may help interpret clinical manometry in mechanistic terms and support hypothesis-driven in silico studies of colonic motility disorders.
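The pacemaker and wall-mechanics ingredients of such a model can be sketched at toy scale. This is not the authors' implementation: the Kuramoto-style phase chain below only stands in for an ICC-inspired oscillator network, the exponential tube law is one common nonlinear choice, and all function names and parameter values are illustrative:

```python
import numpy as np

def simulate_pacemaker_chain(n_cells=20, steps=200, dt=0.05,
                             omega=1.0, coupling=0.5):
    """Kuramoto-style chain standing in for an ICC pacemaker network:
    each cell's phase advances at its intrinsic frequency plus a pull
    toward its neighbours, yielding a propagating activity wave."""
    phases = np.linspace(0.0, np.pi, n_cells)  # proximal-to-distal gradient
    history = np.zeros((steps, n_cells))
    for t in range(steps):
        pull = np.zeros(n_cells)
        pull[1:] += np.sin(phases[:-1] - phases[1:])   # pull from upstream
        pull[:-1] += np.sin(phases[1:] - phases[:-1])  # pull from downstream
        phases = phases + dt * (omega + coupling * pull)
        history[t] = np.sin(phases)  # contraction proxy in [-1, 1]
    return history

def tube_law_pressure(radius, rest_radius=1.0, stiffness=2.0):
    """Simple nonlinear tube law: pressure rises steeply with distension
    and vanishes at the rest radius."""
    strain = radius / rest_radius - 1.0
    return stiffness * (np.exp(strain) - 1.0)
```

Plotting `history` as a heat map (cells along one axis, time along the other) gives a toy spatiotemporal pressure map of the kind the model compares against manometry recordings.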
Kheirbakhsh, R.; Mathur, P.; Lawlor, A.
Multimodal machine learning leverages complementary information from diverse data sources and has shown strong promise in medical imaging, where multimodal data is critical for clinical decision making. In glioma grading, integrating MRI modalities with clinical data can improve diagnostic accuracy, yet systematic comparisons of fusion strategies remain limited. This study evaluates early, intermediate, and late fusion approaches, addressing the question: How does the inclusion of clinical data alongside MRI modalities influence grading performance? To assess modality contributions, we design adaptable fusion layers and employ interpretability techniques, including attention-based analysis. Our results show that incorporating clinical data consistently outperforms unimodal and MRI-only baselines, with intermediate fusion yielding the most reliable gains. Beyond accuracy, the framework reveals how MRI and clinical features jointly shape predictions, underscoring the importance of both fusion design and interpretability for clinical adoption.
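An intermediate-fusion layer of the kind compared in this study can be sketched in a few lines. The encoders, feature dimensions, and function names below are hypothetical stand-ins for the paper's networks; the point is only where the modalities are joined, namely at the latent-feature level, before the shared classification head:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w):
    """Per-modality encoder: one linear layer + ReLU (stand-in for a CNN/MLP)."""
    return np.maximum(0.0, x @ w)

def intermediate_fusion(mri_feats, clinical_feats, w_mri, w_clin, w_head):
    """Intermediate fusion: encode each modality separately, concatenate the
    latent representations, then apply a shared classification head."""
    z = np.concatenate([encode(mri_feats, w_mri),
                        encode(clinical_feats, w_clin)], axis=-1)
    return z @ w_head  # logits over glioma grades

# Toy shapes: 4 patients, 16 MRI-derived features, 5 clinical variables.
mri = rng.normal(size=(4, 16))
clin = rng.normal(size=(4, 5))
w_mri = rng.normal(size=(16, 8))
w_clin = rng.normal(size=(5, 8))
w_head = rng.normal(size=(16, 2))  # 8 + 8 fused dims -> 2 grade classes
print(intermediate_fusion(mri, clin, w_mri, w_clin, w_head).shape)  # (4, 2)
```

Early fusion would instead concatenate the raw inputs before any encoder, and late fusion would average per-modality predictions; the fused latent `z` is also where attention-based interpretability can attribute the prediction to each modality.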